AITopics | markov decision process and informativeness

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsDec-26-2025, 03:47:34 GMT

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.

hitting cost, markov decision process and informativeness, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

Add feedback

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsAug-20-2025, 09:33:14 GMT

However, strikingly, this measure is independent of rewards and is a function of only the transitions.

average reward, mdp, reward function, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Add feedback

Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsJun-1-2025, 23:54:00 GMT

This paper introduces a new complexity measure for MDPs called maximum expected hitting cost. Unlike the diameter measure is only a function of the transition dynamics, this new measure takes into account the reward dynamics as well. The authors show theoretically that under the same assumptions as previous authors who introduced diameter, this new measure is a tighter upper bound. Furthermore, they show the usefulness of this measure by showing that it can be used to better understand the informativeness of rewards when using potential based reward shaping and they prove theoretically that in a large class of MDPs potential based reward shaping is bounded by a multiplicative factor of 2 on their maximum expected hitting costs. I enjoyed reading this paper. I appreciated the structure that the authors used in this paper which first introduced all the necessary prior work (related to diameter) cosily but thoroughly enough before introducing their contributions.

contribution, diameter, markov decision process and informativeness, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.52)

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsJun-1-2025, 23:53:49 GMT

The paper introduces a new complexity measure for MDPs, the expected hitting costs. In contrast to former complexity measures, the hitting costs also depend on the reward of the MDP and can provide a tighter bound for UCRL2. The theory also provides an intersting connection between reward shapeing and the complexity of a MDP. All reviewers appreciated the strong theoretical contribution of the paper which improves our theoretical understanding of the complexity of MDPs. The reviewers also liked that the paper is well written and establishes connections to reward shaping, a method that has also a highly practical value. All reviewers recommend acceptance and I agree with their assessment.

complexity measure, hitting cost, markov decision process and informativeness, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsOct-11-2024, 06:22:46 GMT

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting.

decision support system, machine learning, markov decision process and informativeness, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Dai, Falcon, Walter, Matthew

Neural Information Processing SystemsMar-18-2020, 23:45:59 GMT

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting.

hitting cost, markov decision process and informativeness, potential-based reward, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Dai, Falcon Z., Walter, Matthew R.

arXiv.org Machine LearningJul-3-2019

We propose a new complexity measure for Markov decision processes (MDP), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1907.02114

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Add feedback

Filters

Collaborating Authors

markov decision process and informativeness

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards